Computer  Science  Department 


TECHNICAL  REPORT 


Performance  of  Shared  Memory 
in  a  Parallel  Computer 


Kevin  Donovan 

Technical  Report  498 

March  1990 


NEW  YORK  UNIVERSITY 




Department  of  Computer  Science 
Courant  Institute  of  Mathematical  Sciences 




251  MERCER  STREET,  NEW  YORK,  N.Y  10012 



Performance  of  Shared  Memory 
in  a  Parallel  Computer 

Kevin  Donovan 
March  1990 

Abstract 

Suppose that in a single memory cycle, n independent random accesses are made to m separate memory
modules, with each access equally likely to go to any of the memories. Let $L_{avg}$ then represent the expected
value of the maximum number of references to a single memory module. Here we show a new method for
analyzing this problem. It allows one to efficiently compute narrow upper and lower bounds for $L_{avg}$ as a
function of m and n. We also determine the asymptotic behavior of $L_{avg}$ as m and n grow to infinity at a
constant ratio A = n/m. For any A > 0, this paper proves that $L_{avg} = (1 + o(1)) \log m / \log\log m$ as m and
n → ∞. An equivalent result was previously obtained by Gonnet in connection with a hashing problem.
Using different methods, Gonnet found the asymptotic value of $\Gamma^{-1}(m)$ (plus lower order terms).

Index Terms: Asymptotic behavior, combinatorial analysis, crossbar networks, MIMD computers,
performance evaluation.

1  Introduction 

The  particular  application  that  motivated  this  study  is  the  performance  analysis  of  parallel  computers, 
especially  vector  machines  in  which  processors  and  memories  are  connected  by  a  crossbar.  This  means  there 
is  a  communication  path  between  each  processor  and  memory  that  does  not  conflict  with  the  path  between 
any  other  processor  and  memory.  However,  if  a  memory  module  is  addressed  by  more  than  one  processor 
during  an  instruction  cycle,  the  different  accesses  must  be  serviced  sequentially,  and  the  program  cannot 
advance  until  all  memory  requests  are  satisfied.  In  such  a  case,  the  time  to  perform  an  instruction  increases 
linearly with the length of the maximum request queue. Consequently, the hardware designer wishes the
memory  requests  to  be  spread  as  uniformly  as  possible  on  average. 

In  typical  cases,  the  programmer  controls  how  the  data  are  distributed  across  memory  modules.  Ideally, 
the  programmer  tries  to  structure  the  data  and  references  so  that  no  one  memory  contains  more  than  one 
of  the  items  accessed  in  a  single  instruction  cycle.  E.g.,  if  two  vectors  are  added,  and  the  addition  proceeds 
sequentially  through  the  vector,  then  the  first  item  of  the  vector  can  be  kept  in  the  first  memory  module, 
the  second  in  the  second,  and  so  forth.  Some  vector  operations  do  not  have  such  a  simple  sequence  of 
accesses,  however.  In  order  to  allow  efficient  access  for  a  variety  of  such  sequences,  the  programmer  may  use 
a  hashing  scheme  to  spread  the  data  across  memories.  The  data  may  be  effectively  randomly  distributed 
across memory modules, and we wish to know the expected length of the maximum memory queue in such
a case.


As  hardware  costs  go  down,  the  number  of  memory  modules  in  a  system  can  go  up.  Naturally,  the 
behavior of maximum queue length in a system with four or eight memories will be different from that of one
with  64  or  256  or  more.  One  might  wish  to  know  how  the  ratio  of  processors  to  memories  affects  the  queue 
length.  In  this  paper,  we  look  at  what  happens  to  this  value  for  large  numbers  of  processors  and  memories. 

Given n processors and m memories, the problem is analyzed here by looking at the probability distribution
of the maximum memory queue length. We derive a recurrence relation for the probabilities, and find
inequalities satisfied by the recurrence. These inequalities allow us to efficiently determine narrow lower and
upper bounds for the probability distribution function. Moreover, the upper and lower bounds approach a
common limit as n and m grow to infinity at a constant ratio A = n/m. By finding an expression for this
limit, we determine the asymptotic behavior of $L_{avg}$, the expected maximum queue length.

We  obtain  the  following  result.  Given  any  positive  rational  number  A,  let  there  be  m  memories  and  Am 
processors,  and  let  m  grow  to  infinity  while  A  stays  fixed.  We  find  that  the  expected  value  Lavg  grows  to 
infinity  with  m.  More  precisely,  for  any  A  >  0,  we  show  that 

\[ L_{avg} = \frac{\log m}{\log\log m}\,(1 + o(1)) \qquad \text{as } m \to \infty. \]

The correction factor is not necessarily close to unity for practical values of m and n. So we show graphs
of $L_{avg}$ for m in the range 100-100,000, and 0.25 ≤ A ≤ 4.0. The graphs confirm that the growth rate of
$L_{avg}$ is approximately the same as that indicated by the formula above, and the graphs accurately describe
the magnitude of $L_{avg}$ for this range of m and n. We also prove that the probability distribution of the
maximum queue length becomes increasingly concentrated as m and n grow to infinity at a constant ratio.
When m and n are large, the maximum queue length is very likely to be equal to either $\lfloor L_{avg} \rfloor$ or $\lceil L_{avg} \rceil$.
We also report on simulations made to verify the analysis that led to these results.
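The kind of simulation mentioned above can be sketched in a few lines. This is only an illustrative Monte Carlo estimate, not the simulation program used for the report; the function names, the trial count, and the seed are assumptions made for the example. It also evaluates the leading term log m / log log m, which, as noted above, need not be close to the true value for practical m.

```python
import math
import random
from collections import Counter


def max_queue_length(n, m, rng):
    """Simulate one memory cycle: n processors each pick one of m modules uniformly."""
    counts = Counter(rng.randrange(m) for _ in range(n))
    return max(counts.values()) if counts else 0


def estimate_l_avg(n, m, trials=2000, seed=0):
    """Monte Carlo estimate of the expected maximum queue length."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    return sum(max_queue_length(n, m, rng) for _ in range(trials)) / trials


def leading_term(m):
    """Leading asymptotic term log m / log log m (natural logarithms, m > e)."""
    return math.log(m) / math.log(math.log(m))


if __name__ == "__main__":
    # Compare the simulated average with the leading term for the case n = m.
    for m in (128, 1024, 8192):
        print(m, round(estimate_l_avg(m, m), 2), round(leading_term(m), 2))
```

The estimate carries sampling error that shrinks roughly as the inverse square root of the number of trials, so it is a check on the analysis rather than a replacement for the bounds.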

This  problem  falls  in  the  general  category  of  urn  problems.  A  good  survey  of  related  problems  can 
be  found  in  Johnson-Kotz  [1977].  The  asymptotic  behavior  of  a  similar  problem  was  studied  in  Klamkin- 
Newman  [1967]  and  extended  by  Dwass  [1969].  In  the  terminology  used  here,  they  looked  at  what  happened 
when  the  number  of  memories  was  held  fixed,  and  considered  how  many  references  would  need  to  be  made 
in  order  for  the  expected  value  of  the  maximum  queue  length  to  be  r.  This  was  not  the  growth  pattern  we 
were  interested  in  (we  wanted  to  know  what  happens  when  m  and  n  grow  together),  and  these  papers  did 
not  investigate  the  error  terms  nor  the  speed  of  convergence. 

Flajolet  [1983]  sharpened  and  extended  the  estimates  made  by  these  authors.  The  application  addressed 
by  Flajolet  relates  to  trie  searching,  particularly  the  maximum  depth  of  the  trie  directories.  Like  the  previous 
papers,  Flajolet  [1983]  obtains  the  number  of  memories  as  a  function  of  the  number  of  processors  and  the 
maximum queue length, whereas we obtain the queue length as a function of the number of processors and
memories.  Otherwise,  the  two  problems  are  the  same. 

However,  Flajolet  [1983]  uses  methods  entirely  different  from  those  we  will  be  presenting.  Here,  we  set 
up  a  recurrence  relation  and  find  inequalities  based  on  the  recurrence.  In  contrast,  Flajolet  [1983]  makes 
use  of  the  fact  that  the  generating  function  is  an  analytic  function,  in  fact  a  polynomial.  He  constructs  an 
integral  whose  value  is  the  solution  being  sought,  and  he  estimates  the  value  of  the  integral.  He  proves  that 
the error terms of the estimates asymptotically go to zero, and he presents graphical results showing that
one  form  of  the  approximate  solution  gives  close  results  even  when  m  and  n  are  small. 

We  attempted  to  write  a  computer  program  based  on  the  analysis  shown  in  Flajolet  [1983],  and  to 
compare  his  results  with  ours.  We  ran  into  some  difficulties  in  writing  the  program,  however.  We  were  able 
to  get  this  other  approach  to  work  only  for  certain  values  of  m  and  n,  including  the  cases  m  =  n  >  2048, 
and  also  when  m  =  10^  and  n  =  10^.  In  all  cases,  the  results  found  by  the  method  that  will  be  derived  here 
agreed  closely  with  those  found  using  our  implementation  of  Flajolet's  approach.  Our  programs  for  the  two 
different  approaches — Flajolet's  and  that  presented  here — were  about  equally  efficient. 

Finally,  an  equivalent  problem  was  investigated  by  Gonnet  [1983]  with  respect  to  hashing  with  separate 
chaining. Gonnet considered how long hash chains grow if n hash items are distributed among m hash
buckets,  with  hash  conflicts  handled  by  chaining.  By  identifying  processors  with  hash  items,  memories  with 
hash  buckets,  and  memory  queues  with  hash  chains,  the  problems  are  immediately  seen  to  be  the  same. 

Several  new  results  are  presented  in  this  paper.  This  appears  to  be  the  first  work  that  looks  at  the 
problem  from  the  point  of  view  of  finding  the  maximum  queue  length  in  terms  of  the  numbers  of  processors 
and  memories,  and  consequently,  the  limit  theorem  we  present  is  new.  The  method  of  analysis  we  present 
here  is  a  new  approach  to  this  problem.  Both  our  approach  and  Flajolet's  are  based  on  estimates  of  the  error 
terms  of  approximations,  and  the  two  derivations  appear  equally  complex.  Gonnet  used  a  function-theoretic 
approach  which  appears  to  be  roughly  as  complex  as  the  other  two  methods;  a  disadvantage  of  his  approach 
is  that  it  looks  only  at  the  average  value,  rather  than  the  probability  distribution  function,  although  it  may 
be  that  Gonnet  simply  chose  to  study  the  more  restricted  problem.  Previous  authors,  including  Flajolet  and 
Gonnet,  commented  on  the  sharpness  of  the  probability  distribution  function,  but  this  appears  to  be  the 
first  paper  that  proves  this  property  to  hold  asymptotically.  Finally,  our  numeric  results  contain  both  upper 
and  lower  bounds,  whereas  previous  authors  had  considered  only  the  asymptotic  behavior. 

In  the  section  following  this  one,  we  define  the  basic  terminology,  and  then  we  present  numerical  and 
graphical  results  of  this  study.  Next,  in  Sections  4-7,  we  derive  the  approximate  method  for  determining  the 
maximum queue length. This method gives upper and lower bounds, rather than exact answers. However,
the bounds are close, and the running time for each (m, n)-case is only a few seconds. In the analysis, we
first  obtain  a  recurrence  relation,  in  Section  4.  Then,  in  Section  5,  we  show  that  each  term  of  the  recurrence 
satisfies  certain  inequalities.  Section  6  discusses  how  these  inequalities  can  be  used  in  an  algorithm  to  find 
upper  and  lower  bounds  of  the  expected  maximum  queue  length.  Afterwards,  in  Section  7,  the  recurrence 
and  the  inequalities  are  used  to  determine  the  asymptotic  behavior  of  the  probability  distribution  function, 
when  n  and  m  grow  to  infinity  at  a  constant  ratio  A  =  n/m.  The  final  section  of  the  paper  contains  a 
summary  and  conclusions. 

2  Notation and Terminology

We first list some of the standard mathematical notation and conventions followed here. All sets
considered in the paper are finite, and if A is a set, then |A| represents the number of elements that A contains.

If r is a real number, then $\lfloor r \rfloor$ represents the floor function of r (greatest integer ≤ r), and $\lceil r \rceil = -\lfloor -r \rfloor$
the ceiling function. The binomial coefficients $\binom{r}{k}$ are defined as in Knuth [1973]. The falling factorial
function is represented by $[r]_k$, and is defined by $[r]_k = r(r-1)\cdots(r-k+1)$. For any real r, $\binom{r}{0} = [r]_0 = 1$,
and $\binom{r}{k} = [r]_k = 0$ if k < 0.

Following  Knuth  [1973],  the  limits  of  a  summation  are  not  shown  if  these  include  all  terms  for  which 
the  defining  expression  is  nonzero.  The  advantage  of  this  practice  is  that  one  need  not  keep  track  of  how 
the  limits  change  when  an  expression  is  substituted  for  the  index  variable. 

If f is an expression, then $Df$ is the derivative with respect to the independent variable, which is
normally t here. $D^n f$ is the nth derivative, and $D^0 f = f$. The symbol "log" refers to the natural logarithm:
$\log x = \log_e x$.

There  now  comes  the  terminology  relevant  to  this  particular  application.  First,  the  three  variables  that 
appear  most  frequently: 

m  -  Number  of  memories  (or  letters  in  alphabet;  see  below), 
n  -  Number  of  processors  (or  length  of  word), 
r  -  Maximum  references  to  any  one  memory  (or  maximum  repetitions  of  any  letter  in  a  word). 

Since any processor can access any memory independently, there are $m^n$ different access patterns of
processors  to  memories. 

The  standard  terminology  for  a  problem  of  this  sort,  as  in  Johnson-Kotz  [1977],  is  to  speak  of  randomly 
distributing n balls into m urns or boxes. In these terms, the quantity of interest, here called r, is the
maximum number of balls in any single urn. Another equivalent representation that will sometimes be used
in  this  paper,  is  that  of  words  of  length  n,  with  letters  drawn  randomly  from  an  alphabet  containing  m 
letters.  We  are  interested  in  the  maximum  number  of  occurrences  of  any  of  the  m  letters.  The  reason  for 
using  words  rather  than  processors  and  memories  is  that  it  is  easier  to  display  a  word  than  it  is  to  draw 
a  diagram.  Consequently,  much  of  the  analysis  in  this  paper  will  be  phrased  in  terms  of  the  word  model, 
although we will always point out what the corresponding concept is in terms of processors and memories.

As an example, suppose n = 3 and m = 2. Let the processors be numbered 1, 2, and 3, and label the
two memories a and b. The word baa corresponds to the access pattern where the first processor accesses
memory b, and the other two processors access memory a. The total number of words is $m^n = 8$; they are
{aaa, aab, aba, abb, baa, bab, bba, bbb}. Of these, aaa and bbb each contain a letter that occurs a maximum of
three times, so that the quantity r we are interested in is equal to 3. In the other six words, either a or b
occurs a maximum of two times, and the quantity of interest is 2.
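The example can be reproduced by brute-force enumeration. The sketch below lists every word of length n over a given alphabet and groups the words by the value of r; the helper names are chosen for illustration only, and the approach is practical only for very small n and m.

```python
from collections import Counter
from itertools import product


def max_repetition(word):
    """Largest number of occurrences of any single letter in a nonempty word."""
    return max(Counter(word).values())


def tabulate(n, alphabet):
    """Map each value of r to the words whose most frequent letter occurs r times."""
    table = {}
    for letters in product(alphabet, repeat=n):
        word = "".join(letters)
        table.setdefault(max_repetition(word), []).append(word)
    return table


# n = 3 processors, m = 2 memories labeled a and b.
table = tabulate(3, "ab")
for r in sorted(table):
    print(r, table[r])
```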

In the analysis that follows, we will be using a cumulative version of this quantity. Using the word
representation, we will be looking at the number of words for which the maximum number of occurrences
of  any  letter  is  less  than  or  equal  to  r  (rather  than  just  equal  to  r).  Because  it  is  frequently  used,  we  give  a 
name  to  this  quantity: 

Q(n, m, r) - Number of words containing n letters, drawn from an m-letter alphabet, for
which no letter occurs more than r times.

In terms of processors and memories, Q(n, m, r) is the number of ways that n processors can access m
memories, with no more than r references to any one memory. We are assuming that all access patterns
are equally likely. Consequently, if we divide by the total number of access patterns $m^n$, then the quotient
$Q(n, m, r)/m^n$ is the probability that there are no more than r accesses to any one memory. In other words,
if we can characterize Q(n, m, r), then our problem is solved. This is in fact what we will be doing in the
later  sections  of  the  paper. 

Continuing with the above example where n = 3 and m = 2, we have Q(3, 2, 1) = 0, since no such word
contains only a single a and a single b. Q(3, 2, 2) = 6, because there are six words for which a or b occurs
twice, and Q(3, 2, r) = 8 if r ≥ 3, because there are a total of eight words, and in all eight of them, no letter
occurs more than three times.
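For small arguments, Q(n, m, r) can be counted directly. The sketch below uses a simple letter-by-letter recurrence (choose which j of the n positions hold the last letter, then fill the rest from the remaining m - 1 letters). This is not the recurrence derived later in the paper; it is only an assumed convenience for checking small cases, and `prob_at_most` is a hypothetical helper name.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def Q(n, m, r):
    """Number of n-letter words over an m-letter alphabet with no letter used more than r times."""
    if n < 0 or m < 0 or r < 0:
        return 0
    if m == 0:
        return 1 if n == 0 else 0  # only the empty word exists over an empty alphabet
    # The last letter of the alphabet occupies j of the n positions, 0 <= j <= min(n, r).
    return sum(comb(n, j) * Q(n - j, m - 1, r) for j in range(min(n, r) + 1))


def prob_at_most(n, m, r):
    """Probability that no memory receives more than r accesses (assumes m >= 1)."""
    return Fraction(Q(n, m, r), m ** n)


print(Q(3, 2, 1), Q(3, 2, 2), Q(3, 2, 3), prob_at_most(3, 2, 2))
```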

3  Data  and  Analysis 

Before  presenting  the  derivation  of  the  calculations,  we  show  numerical  and  graphical  results. 
In Figures 1-4 there appear graphs plotting the results for 104 different values of m, the number of
memories. The variable m runs from 16 to 120194, with equal spacing on a logarithmic scale; each new value
of m is $2^{1/8}$ times bigger than the previous value. Since m is an integer, we round each value to the nearest
integer.

We made three different sets of runs for these values of m. Letting n be the number of processors and
A = n/m the ratio of processors to memories, we made runs for A = 0.25, 1.00, and 4.00. For the case
A = 0.25, since n must be an integer, we set $n = \lfloor m/4 + 0.5 \rfloor$. In each case, we computed upper and lower
bounds to the expected maximum queue length, and to the probability distribution function.
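The grid of problem sizes described above is easy to regenerate. The sketch below assumes a step of one eighth of a doubling, which is the spacing consistent with 104 values running from 16 to 120194; `processors_for` applies the stated rounding rule and extends it to other ratios as an assumption.

```python
import math


def memory_grid(first=16, count=104, steps_per_doubling=8):
    """Values of m equally spaced on a log scale, rounded to the nearest integer."""
    return [round(first * 2 ** (i / steps_per_doubling)) for i in range(count)]


def processors_for(m, ratio):
    """n = floor(ratio * m + 0.5); for ratio = 0.25 this is floor(m/4 + 0.5)."""
    return math.floor(ratio * m + 0.5)


grid = memory_grid()
print(len(grid), grid[0], grid[-1])
print([(m, processors_for(m, 0.25)) for m in grid[:5]])
```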

The bounds for the expected maximum queue length are shown in Figure 1 on a semilogarithmic scale.
For each value of A, the graph is close to a straight line, which indicates that the maximum queue length
increases at a logarithmic rate with increasing m for this range of values. In fact, we will be showing in
Section 7 that the rate of growth is o(log m) as m → ∞, which would correspond to a curve whose slope
decreases with increasing m. For the two higher pairs of curves (A = 1.0 and A = 4.0), the decrease in slope
is visible, but the effect is very slight for this range of numbers.

It is apparent from this graph that the lower and upper bounds become closer as m increases. For
m > 2000, the curves almost overlap when drawn to the scale of Figure 1.

The next three figures show pairs of three-dimensional curves obtained from the same data. Each pair
of curves shows the upper bound of the probability distribution function for a different value of A. The first
of these pairs, Figure 2, shows data for A = 0.25. The left axis indicates the number of memories, on a
logarithmic scale. The front axis is the same as the variable we have been calling r, drawn to linear scale.
It  represents  the  length  of  the  maximum  memory  queue.  The  vertical  axis  is  linear  with  probability. 

For example, in the upper graph in Figure 2, the probability distribution function for m = 16 is shown
at the back of the graph. From the graph, the case r = 1 is most likely; i.e., if m = 16 and A = 0.25 (so that
n = 4), then we can expect the length of the maximum queue to be one in the majority of cases. On the
same graph, the case m = 120194 is shown at the front of the curve. Here, we see that the most likely cases
are  r  =  3  or  r  =  4,  with  the  latter  being  a  little  more  likely. 

The  bottom  curve  of  Figure  2  shows  the  same  data  as  the  top  curve,  but  drawn  from  a  different 
perspective.  We  have  lowered  our  angle  of  sight,  and  we  are  looking  directly  at  the  two-dimensional  cross 
section  showing  results  for  m  =  120194.  Figure  3  is  just  like  this  previous  curve,  but  with  results  for  A  =  1.0. 
Similarly,  Figure  4  shows  data  for  A  =  4.0.  The  important  characteristic  of  all  three  graphs  is  the  sharpness 
of  the  peak  of  the  probability  density  function.  The  distribution  is  concentrated  at  one  or  two  values  of  r  if 
A ≤ 1, and at approximately five values of r when A = 4.0.
Procs     Mems      Entries in column order (Simul, FFT, Low, High, Va), blank entries omitted
16384     16384     6.92    6.92    6.62
32768     32768     7.26    7.27    6.96
65536     65536     7.56    7.57    7.29
1024      512       7.48    7.48    7.46    7.58    6.71
1024      128       16.22   16.16   16.45   13.10
128       1024      2.27    2.26    2.27    2.49
10®       10«       11.63   11.63   11.41
10^       10®       3.34    3.34    3.37
10®       10^       157.34  157.47  148.45

Table  1.  Maximum  Expected  Queue  Length  as  Calculated  in  Different  Ways 


In Table 1, we show a selection of more complete data. The seven headings indicate the following.
"Procs" and "Mems" are the numbers of processors and memories, respectively. The remaining columns
each show the expected value of the maximum queue length, when calculated in different ways. The column
labeled "Simul" contains data obtained from a simulation. "FFT" is the value calculated using the fast
Fourier transform to find the desired coefficient of the generating function. This method of solution does not
use any approximations, and so the numbers in this column are exact except for rounding error. "Low" and
"High" are lower and upper bounds of the average queue length, calculated according to the algorithm that
is derived in this paper. Finally, Va is an easily calculated "asymptotic" approximation to the true value; it
is defined in Section 7. A blank entry in a column indicates the corresponding item was not calculated for
this set of n and m. Since the calculation of the FFT values is exact except for rounding error, it must be
(and is) always within the range of values labeled Low and High.
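The idea behind the exact column can be illustrated for small n and m by reading the needed coefficient off the generating function. The sketch below is an assumption-laden stand-in: it multiplies the polynomials directly with exact rational arithmetic instead of using the fast Fourier transform, so it avoids rounding error but is far slower than an FFT-based program would be. The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def truncated_exp_power(m, r, n):
    """Coefficients, up to t^n, of (1 + t/1! + ... + t^r/r!)^m."""
    base = [Fraction(1, factorial(i)) for i in range(min(r, n) + 1)]
    result = [Fraction(1)]  # start from the constant polynomial 1
    for _ in range(m):
        size = min(len(result) + len(base) - 1, n + 1)
        product = [Fraction(0)] * size
        for i, a in enumerate(result):
            for j, b in enumerate(base):
                if i + j <= n:  # terms above t^n are never needed
                    product[i + j] += a * b
        result = product
    return result


def prob_at_most(n, m, r):
    """P(n, m, r) = n! [t^n] (1 + t/1! + ... + t^r/r!)^m / m^n."""
    coeffs = truncated_exp_power(m, r, n)
    coeff_n = coeffs[n] if n < len(coeffs) else Fraction(0)
    return coeff_n * factorial(n) / Fraction(m) ** n


def expected_max(n, m):
    """E[max] = sum over r >= 0 of Pr(max > r); the terms vanish for r >= n."""
    return sum(1 - prob_at_most(n, m, r) for r in range(n))


print(expected_max(3, 2), float(expected_max(8, 8)))
```

For the running example with n = 3 and m = 2, six words have maximum 2 and two have maximum 3, so the expected maximum is 9/4.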

We  pass  now  to  the  derivation  of  the  algorithm  from  which  these  numbers  were  calculated. 

4  Recurrence  Relation 

The  next  several  sections  of  the  paper  develop  an  approach  to  this  problem  that  gives  an  efficient 
approximate  solution,  as  well  as  the  asymptotic  behavior  of  the  probability  distribution  function  of  the 
maximum  queue  length. 

As  a  first  step  in  the  analysis,  we  find  a  generating  function  for  Q(n,m,r),  which  in  this  case  is  a 
polynomial  that  can  be  algebraically  manipulated.  Working  with  the  generating  function,  we  find  a  recurrence 
relation  for  Q(n,  m,  r),  or  rather  for  a  normalized  variant  of  it,  here  called  P(n,  m,  r).  The  recurrence  takes 
the  form  of  a  sum  of  terms,  each  of  which  contains  two  nonnegative  factors.  In  the  section  that  follows  this 
one,  we  prove  that  these  factors,  and  hence  each  term  of  the  recurrence,  satisfy  certain  inequalities.  By  using 
these  inequalities  in  the  recurrence  relation,  we  are  able  to  take  one  of  the  two  factors  of  the  summation 
outside.  In  most  cases  we  are  interested  in,  the  factor  that  remains  is  very  small  except  for  a  few  terms. 
Consequently,  we  need  consider  only  a  few  terms,  and  we  can  use  the  inequalities  to  estimate  the  terms  that 
are  omitted.  We  are  able  to  obtain  both  upper  and  lower  bounds  on  the  true  value.  Section  6  shows  how 
this  is  done  in  the  computer  algorithm. 

Then,  in  Section  7  we  take  the  inequalities  and  pass  to  the  limit.  There  are  various  ways  that  n  and/or 
m can grow to infinity. The case we study, and obtain a limit theorem for, is where n and m grow together
at a constant ratio A = n/m. We obtain the results that have already been described in the introduction to
this  paper. 


Many books discuss generating functions, including Knuth [1973]. The generating function we will now derive
has  also  been  used  by  previous  authors  working  on  this  problem,  including  Flajolet  [1983]  among  those  who 
are  cited  here. 

For the derivation of the generating function, let $i_k$ represent the number of accesses to the kth memory.
Since each of the n processors accesses a memory, the sum of the $i_k$ must be equal to n. If none of the
memories is accessed by more than r processors, then all of the $i_k$ are less than or equal to r. Furthermore,
if $i_k = l$, then the l processors accessing the kth memory can be chosen in $\binom{n}{l}$ different ways. If all of the
$i_k$ are specified, then the number of ways that they can be chosen is equal to the multinomial coefficient
that appears in the formula below. The total number of such access patterns, Q(n, m, r), is then found by
summing over all combinations of the $i_k$ that satisfy the constraints:

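\[
Q(n,m,r) \;=\; \sum_{\substack{i_1 + i_2 + \cdots + i_m = n \\ 0 \le i_k \le r,\ \forall k}} \binom{n}{i_1, i_2, \ldots, i_m}
\;=\; \sum_{\substack{i_1 + i_2 + \cdots + i_m = n \\ 0 \le i_k \le r,\ \forall k}} \frac{n!}{i_1!\, i_2! \cdots i_m!}.
\]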

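By the multinomial theorem, the Q(n, m, r) appear as the coefficients of the following polynomial:

\[
\left(1 + \frac{t}{1!} + \frac{t^2}{2!} + \cdots + \frac{t^r}{r!}\right)^{m} \;=\; \sum_{n \ge 0} Q(n,m,r)\, \frac{t^n}{n!}.
\]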

If this expression is evaluated at t = 0, only the constant term Q(0, m, r) remains. Evaluating the
first derivative at t = 0 gives Q(1, m, r), and in general, Q(n, m, r) can be obtained by evaluating the nth
derivative at t = 0. Inasmuch as our analysis is based on the generating function, we define Q by this
relation. In addition, we define the function to be zero if any of the arguments is negative. The reason for
extending the definition in this way is that it allows us to write summations without bothering to explicitly
show the limits of summation.

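\[
Q(n,m,r) =
\begin{cases}
0 & \text{if } n, m, \text{ or } r < 0; \\[4pt]
D^n \left(1 + \dfrac{t}{1!} + \cdots + \dfrac{t^r}{r!}\right)^{m} \Big|_{t=0} & \text{otherwise.}
\end{cases}
\tag{1}
\]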


These last two equations show the generating functions for Q(n, m, r). The recurrence relation that
will be given here is not in terms of Q(n, m, r), but rather uses a normalized version of this quantity,
called P(n, m, r). Whereas Q(n, m, r) represents the number of ways that something can occur, P(n, m, r)
represents the fraction (or probability) of cases for which this occurs.
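As a small check of the generating function for Q(n, m, r), the running example with m = 2 and r = 2 can be expanded by hand. The bracket notation $[t^n]$ for "coefficient of $t^n$" is introduced here only for this illustration.

```latex
\left(1 + t + \frac{t^2}{2!}\right)^{2}
  = 1 + 2t + 2t^2 + t^3 + \frac{t^4}{4},
\qquad
Q(n,2,2) = n!\,[t^n]\left(1 + t + \frac{t^2}{2!}\right)^{2},
\qquad
Q(3,2,2) = 3!\cdot 1 = 6 .
```

This agrees with the count of six three-letter words over {a, b} in which no letter occurs more than twice, and dividing by $2^3 = 8$ gives the corresponding fraction 3/4.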

P(n, m, r) - Fraction of words containing n letters, drawn from an m-letter alphabet, for
which no letter occurs more than r times.

As before, we define the function over all the integers by setting it to zero for negative arguments. We
also need to define what happens when m = n = 0 in a way consistent with the recurrence formula we will
be deriving.

\[
P(n,m,r) =
\begin{cases}
0 & \text{if } n, m \text{ or } r < 0; \\[4pt]
1 & \text{if } m = n = 0 \text{ and } r \ge 0; \\[4pt]
\dfrac{Q(n,m,r)}{m^n} & \text{otherwise.}
\end{cases}
\]


The  function  we  call  F  is  the  weighting  factor  in  the  recurrence  relation.  A  considerable  part  of  the 
analysis that follows is given to investigation of the properties of this function. F has four arguments, all
of  them  integers.  The  definition  is  adjusted  so  that  F(k;n,m,r)  =  0  whenever  F  is  the  weighting  factor  of 
something  that  is  itself  equal  to  zero.  The  function  is  defined  by: 


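\[
F(k;n,m,r) =
\begin{cases}
0 & \text{if } n, m \text{ or } r < 0, \text{ or } rm < n; \\[6pt]
\dfrac{n!}{m^{n} (r!)^{m}} & \text{if } m = k \text{ and } n = kr; \\[10pt]
\dbinom{m}{k} \dfrac{[n]_{kr}}{m^{kr} (r!)^{k}} \left(\dfrac{m-k}{m}\right)^{n-kr} & \text{otherwise.}
\end{cases}
\tag{2}
\]

(The second case above is the same as the last one, if we make the interpretation $0^0 = 1$.)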

Lemma 1. Let r ≥ 1. Then P(n, m, r) satisfies the following recurrence relation:

\[
P(n,m,r) = \sum_{k} F(k;n,m,r) \cdot P(n-kr,\, m-k,\, r-1).
\tag{3}
\]

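Proof. Use the definition (1) of Q, along with the binomial theorem and Leibniz's rule for the nth
derivative of a product. There results

\[
\begin{aligned}
Q(n,m,r) &= D^n \left(1 + \frac{t}{1!} + \cdots + \frac{t^r}{r!}\right)^{m} \Big|_{t=0} \\
&= D^n \sum_{k} \binom{m}{k} \left(\frac{t^r}{r!}\right)^{k}
   \left(1 + \frac{t}{1!} + \cdots + \frac{t^{r-1}}{(r-1)!}\right)^{m-k} \Big|_{t=0} \\
&= \sum_{k} \sum_{l} \binom{m}{k} \binom{n}{l} \frac{1}{(r!)^{k}}\,
   D^{l}\!\left(t^{kr}\right)\,
   D^{n-l}\!\left(1 + \frac{t}{1!} + \cdots + \frac{t^{r-1}}{(r-1)!}\right)^{m-k} \Big|_{t=0}.
\end{aligned}
\]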

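At t = 0, the factor $D^{l}(t^{kr}) = 0$ if $l \ne kr$, and $D^{kr}(t^{kr}) = (kr)!$. Moreover,
$D^{n-l}\left(1 + (t/1!) + \cdots + (t^{r-1}/(r-1)!)\right)^{m-k} \big|_{t=0} = Q(n-l,\, m-k,\, r-1)$ from the definition of Q. Using this in the last equation
gives

\[
\begin{aligned}
Q(n,m,r) &= \sum_{k} \binom{m}{k} \binom{n}{kr} \frac{(kr)!}{(r!)^{k}}\, Q(n-kr,\, m-k,\, r-1) \\
&= \sum_{k} \binom{m}{k} \frac{[n]_{kr}}{(r!)^{k}}\, Q(n-kr,\, m-k,\, r-1).
\end{aligned}
\]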


Although this sum is formally an infinite series in k, in fact only a finite number of terms are nonzero.
If k < 0 or k > m, then the factor $\binom{m}{k}$ is equal to zero, and if n < kr, then $[n]_{kr} = 0$. Note that, if
n - kr = m - k = 0, then Q(n - kr, m - k, r - 1) = 1, so that the term corresponding to the last expression
above reduces to $[n]_{kr}/(r!)^{k} = n!/(r!)^{m}$. When we next divide by $m^n$, this case is in accord with the second
line of the definition (2).

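Now substitute the relation $Q(n,m,r) = m^n P(n,m,r)$ into both sides of the last equation. After
dividing through by $m^n$, we get

\[
P(n,m,r) = \sum_{k} \binom{m}{k} \frac{[n]_{kr}}{(r!)^{k}}\, \frac{(m-k)^{n-kr}}{m^{n}}\, P(n-kr,\, m-k,\, r-1),
\]

which is equivalent to (3). ∎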
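Lemma 1 can be checked numerically for small arguments. The sketch below is a minimal illustration, not the algorithm of Section 6: it codes the weighting factor F from definition (2) with exact rational arithmetic, evaluates P by the recurrence (3), and compares the result with a brute-force count over all words. The base cases used for r = 0 and n = 0 follow from the definition of P; the function names `falling` and `brute_P` are assumptions for the example.

```python
from collections import Counter
from fractions import Fraction
from functools import lru_cache
from itertools import product
from math import comb, factorial


def falling(n, k):
    """Falling factorial [n]_k = n (n-1) ... (n-k+1), with [n]_0 = 1."""
    out = 1
    for i in range(k):
        out *= n - i
    return out


def F(k, n, m, r):
    """Weighting factor of definition (2), returned as an exact fraction."""
    if min(k, n, m, r) < 0 or r * m < n or k > m or k * r > n:
        return Fraction(0)
    if m == k and n == k * r:
        return Fraction(factorial(n), m ** n * factorial(r) ** m)
    # General case; here m >= 1, so the denominator is nonzero.
    return Fraction(comb(m, k) * falling(n, k * r) * (m - k) ** (n - k * r),
                    m ** n * factorial(r) ** k)


@lru_cache(maxsize=None)
def P(n, m, r):
    """Fraction of n-letter words over m letters with no letter used more than r times."""
    if n < 0 or m < 0 or r < 0:
        return Fraction(0)
    if n == 0:
        return Fraction(1)  # the empty word; this covers m = n = 0
    if r == 0:
        return Fraction(0)  # a nonempty word must use some letter at least once
    total = Fraction(0)
    for k in range(min(m, n // r) + 1):
        weight = F(k, n, m, r)
        if weight:
            total += weight * P(n - k * r, m - k, r - 1)
    return total


def brute_P(n, m, r):
    """Same fraction by enumerating all m^n words (assumes m >= 1; small cases only)."""
    words = list(product(range(m), repeat=n))
    good = sum(1 for w in words if not w or max(Counter(w).values()) <= r)
    return Fraction(good, len(words))


print(P(3, 2, 2), brute_P(3, 2, 2))
```

In the recurrence, k plays the role of the number of memories that receive exactly r accesses, which is why the remaining factor involves n - kr accesses spread over m - k memories with at most r - 1 accesses each.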

There are several obvious properties satisfied by the function F that will be used in subsequent
discussion. We list them here without proof.

Note first of all that F(k; n, m, r) ≥ 0 everywhere. Furthermore, for fixed n, m, and r, the set of k for
which F(k; n, m, r) > 0 is a subsequence of the integers, without any gaps. A similar statement holds if k,
n, and m are fixed, and r varies. Finally, if k = 0 and n, m, r > 0, then by inspection all the factors in (2)
are equal to one, so that F(0; n, m, r) = 1 as well.

In  the  next  section,  we  will  derive  inequalities  satisfied  by  the  recurrence,  and  later  use  these  to  obtain 
a  limit  theorem  concerning  the  growth  of  the  expected  maximum  queue  length  when  n  and  m  grow  large. 

5  Inequalities 

The  recurrence  that  is  shown  in  Lemma  1  appears  too  complicated  to  solve  exactly  in  an  efficient 
manner.  In  this  section,  we  show  that  the  terms  of  the  recurrence  obey  certain  inequalities.  This  will  allow 
us to approximate the factor P(n - kr, m - k, r - 1) by an expression (namely, P(n, m, r - 1)) that can be
taken  outside  the  summation  in  the  recurrence  formula.  The  inequalities  also  permit  us  to  bound  the  error 
that  results  when  we  drop  most  of  the  terms  of  the  summation.  Together,  these  two  simplifications  lead  to 
an  efficient  algorithm  for  estimating  the  probability  distribution  of  the  maximum  queue  length. 

Before stating the inequalities, we recall a pair of definitions we will be using. A sequence of numbers
$\{a_0, a_1, \ldots, a_n\}$ is called unimodal if it has a single maximum; i.e., if there exists an index M such that
$a_0 \le a_1 \le \cdots \le a_{M-1} \le a_M \ge a_{M+1} \ge \cdots \ge a_n$.

A sequence of nonnegative numbers $\{a_0, a_1, \ldots, a_n\}$ is called log-concave if

\[ a_j^2 \ge a_{j-1}\, a_{j+1} \qquad \text{for } j = 1, \ldots, n-1. \]

Notice that a log-concave sequence is necessarily unimodal. This is because a non-unimodal sequence
must contain a value $a_j$ that is a local minimum, with $a_j$ strictly less than at least one of its neighboring
elements, and the sequence does not satisfy the definition of being log-concave for that $a_j$. For our purposes,
the most important property of a log-concave sequence is that, if the sequence is also decreasing, then it
decreases faster than a geometric sequence, i.e., super-exponentially. To see this, assume that all the $a_j$ in
the definition are positive, and $a_0 \ge a_1 \ge \cdots \ge a_n$. Then, applying the above definition for $j = 1, 2, \ldots, n-1$:

$$\frac{a_1}{a_0} \ge \frac{a_2}{a_1} \ge \cdots \ge \frac{a_n}{a_{n-1}}.$$

If we replace all the inequalities in this expression by equal signs, then we have a geometric sequence.
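The ratio argument above can be checked numerically. The sketch below uses the sequence $a_j = 1/j!$ for $j = 1, \ldots, 8$, chosen purely for illustration (it does not appear in the report); the helper names `is_log_concave` and `successive_ratios` are likewise hypothetical.

```python
from fractions import Fraction
from math import factorial


def is_log_concave(a):
    """True if a[j]**2 >= a[j-1] * a[j+1] for every interior index j."""
    return all(a[j] ** 2 >= a[j - 1] * a[j + 1] for j in range(1, len(a) - 1))


def successive_ratios(a):
    """Ratios a[j+1] / a[j]; for a positive log-concave sequence these never increase."""
    return [a[j + 1] / a[j] for j in range(len(a) - 1)]


# Illustrative decreasing, positive, log-concave sequence: 1/1!, 1/2!, ..., 1/8!
a = [Fraction(1, factorial(j)) for j in range(1, 9)]
ratios = successive_ratios(a)

# Geometric sequence with the same first term and the same first ratio.
geometric = [a[0] * ratios[0] ** j for j in range(len(a))]
```

Because the ratios are non-increasing, every term of `a` is bounded above by the corresponding term of `geometric`, which is the sense in which a decreasing log-concave sequence decays at least as fast as a geometric one.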

The proofs of the inequalities are rather lengthy and technical. For this reason, we first state them, and
follow with the proofs. The reader may wish to skip over the proofs on a first reading. Among the results
shown here, only Propositions 4, 7, and 9 are used later. Proposition 4 is referenced in the next section, which
describes the computer algorithm based on these inequalities. It allows the factor $P(n-kr, m-k, r-1)$
in (3) to be replaced by $P(n,m,r-1)$; the latter is independent of $k$, and so can be taken outside the
summation. Propositions 7 and 9 pertain to the other factor in (3), $F(k;n,m,r)$. Roughly speaking, they
state that $F(k;n,m,r)$ decreases rapidly with increasing $k$ and $r$. These two results are used in the proofs
of the limit theorem in Section 7. Each proposition is preceded by one or two lemmas used in the proof of
the proposition. The lemmas themselves are not used outside of this section.

Lemma 2. Let $n \ge 1$. Then $P(n,m,r) \le P(n-1,m,r)$.

Lemma 3. Let $r \ge 0$, $n \ge r$ and $m > 1$. Then $P(n,m,r) \le P(n-r, m-1, r)$.

Proposition 4. Given $r \ge 1$, then for all $k$ in the range $0 \le k \le \min(m, \lfloor n/r \rfloor)$:

$$P(n,m,r-1) \le P(n-kr,\,m-k,\,r-1).$$

Lemma 5. Let $r \ge 1$. Then $F(k;n,m,r) \le F(k;n,m,r-1)$.

Lemma 6. As a function of $r$, the sequence of values $F(k;n,m,r)$ is log-concave.

Proposition 7. Suppose $n$, $m$, and $k$ are held fixed, and $r$ varies. Then the set of values $F(k;n,m,r)$
decays to zero at a super-exponential rate.

Lemma 8. Suppose $n$, $m$, and $r$ are held fixed, and $k$ varies. Then, as a function of $k$, the sequence
$F(k;n,m,r)$ is unimodal. In fact, given $n$ and $m$, the sequence is log-concave everywhere with the exception
of at most one pair $(k,r)$, the exception occurring when $k = m$ and $r = n/m$.

Proposition 9. Suppose $n$, $m$, and $r$ are such that $0 < F(1;n,m,r) < 1$, and that $r \ne n/m$. Then, as a
function of $k$, $F(k;n,m,r)$ decays monotonically to zero at a super-exponential rate.

Now we come to the proofs. The proofs of the first two lemmas do not use the recurrence relation but
rather work directly on the properties of the sets whose cardinalities are being compared.

Suppose $A$ and $B$ are two finite sets, and there is a map $f: A \to B$. By the definition of a map, for each
$x \in A$, the map $f$ associates exactly one $y = f(x) \in B$. Using standard notation, let $f^{-1}(y)$ be the set of
all $x \in A$ for which $f(x) = y$. Then $y_1 \ne y_2 \Rightarrow f^{-1}(y_1) \cap f^{-1}(y_2) = \emptyset$, and $\bigcup_{y \in B} f^{-1}(y) = A$. This leads to
the following estimate of the size of $A$ relative to the size of $B$:

$$|A| \le |B| \cdot \max_{y \in B} |f^{-1}(y)|.$$

In order to make use of this argument in the next two proofs, it is worthwhile to have a name for the
set containing $Q(n,m,r)$ words that was described earlier.

$W(n,m,r)$ = Set of words containing $n$ letters, drawn from an $m$-letter alphabet, for which
no letter occurs more than $r$ times.

As might be expected, we define $W(n,m,r)$ to be the null set if $n$, $m$, or $r < 0$. If $n = m = 0$, and
$r \ge 0$, then $W(n,m,r)$ is a set whose sole member is the "empty word". In all cases, we have $Q(n,m,r) =
|W(n,m,r)|$.

Proof of Lemma 2. The lemma states that if we increase the number of processors, and hold the number
of memories constant, then the expected maximum number of references per memory goes up rather than
down.

By representing $P$ in terms of $Q$, it can be seen that the statement to be proved is equivalent to
$m^{-n}Q(n,m,r) \le m^{-(n-1)}Q(n-1,m,r)$, or $Q(n,m,r) \le m \cdot Q(n-1,m,r)$. By the remarks made above, it
suffices to find a map $f: W(n,m,r) \to W(n-1,m,r)$ for which $|f^{-1}(y)| \le m$ for all $y \in W(n-1,m,r)$. We
let $f$ be the map that truncates the last letter from the word. This letter can be chosen in at most $m$ ways,
so $|f^{-1}(y)| \le m$. ∎
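The truncation argument can be illustrated by enumerating small cases. In this sketch, letters are represented by the integers $0, \ldots, m-1$, and the function names `W` and `max_fiber_of_truncation` are illustrative choices, not names from the report.

```python
from collections import Counter
from itertools import product


def W(n, m, r):
    """All n-letter words over {0, ..., m-1} in which no letter occurs more than r times."""
    return [w for w in product(range(m), repeat=n)
            if all(w.count(a) <= r for a in range(m))]


def max_fiber_of_truncation(n, m, r):
    """Largest number of words in W(n, m, r) that share the same first n-1 letters."""
    fibers = Counter(w[:-1] for w in W(n, m, r))
    # Dropping the last letter cannot raise any letter count, so the image
    # of the truncation map lies inside W(n-1, m, r).
    assert set(fibers) <= set(W(n - 1, m, r))
    return max(fibers.values(), default=0)
```

For every small case one can enumerate, the largest inverse image has at most $m$ elements, which gives $Q(n,m,r) \le m \cdot Q(n-1,m,r)$ as in the proof.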

Proof of Lemma 3. This is equivalent to $m^{-n}Q(n,m,r) \le (m-1)^{-(n-r)}Q(n-r, m-1, r)$, or

$$Q(n,m,r) \le \left(\frac{m}{m-1}\right)^{n-r} m^r\, Q(n-r,\,m-1,\,r).$$

It is sufficient to show a map $f: W(n,m,r) \to W(n-r, m-1, r)$ for which $|f^{-1}(y)| \le (m/(m-1))^{n-r}\, m^r$
for all $y \in W(n-r, m-1, r)$.

Let $x = a_1 \ldots a_n \in W(n,m,r)$, and let $f(x) = b_1 \ldots b_{n-r} \in W(n-r, m-1, r)$. By the definitions
of their respective sets, the letters $a_i$ are drawn from an $m$-letter alphabet, and the $b_i$ are drawn from an
$(m-1)$-letter alphabet. Without loss of generality, assume these two alphabets are identical except for the
letter $Z$, which is present only in the larger alphabet. Call the first $n-r$ letters $a_1 \ldots a_{n-r}$ the left part of
$x$, and the remaining letters $a_{n-r+1} \ldots a_n$ the right part. Assume the left part of $x$ contains $j$ occurrences
of the letter $Z$, and the right part contains $k$ occurrences of $Z$. Since $x \in W(n,m,r)$, then by definition
$j + k \le r$.

The map $f$ can be viewed as the composition of two maps. The first map takes the $k$ $Z$'s in the right
part and moves them to the end of the word, without changing the relative order of the other $r-k$ letters
in the right part of $x$. The second map then takes the $j$ $Z$'s in the left part of $x$, and exchanges them with
the first $j$ letters of the right part of $x$ (none of which is a $Z$), without changing the relative order of the
letters moved. $f(x)$ then consists of the first $n-r$ letters of the following word:

$$\underbrace{\;\cdots\cdots\;}_{n-r\ \text{letters, no } Z\text{'s}}\ \underbrace{\;\cdots\;}_{r-j-k\ \text{letters, no } Z\text{'s}}\ \underbrace{Z \cdots Z}_{j\ \text{times}}\ \underbrace{Z \cdots Z}_{k\ \text{times}}$$

For a given $y \in W(n-r, m-1, r)$, the $k$ $Z$'s that are moved by $f$ can be chosen in $\binom{r}{k}$ ways. The
$j$ $Z$'s on the left can be chosen in $\binom{n-r}{j}$ ways. There are an additional $r-j-k$ letters on the right that
are not $Z$'s; each can be chosen independently from the remaining $m-1$ letters, so they contribute a factor
$\le (m-1)^{r-j-k}$ (this is an overestimate, since certain choices of these letters may cause a single letter to
occur more than $r$ times). To find the total number of elements in the inverse image $f^{-1}(y)$, we sum over
all combinations of $j$ and $k$, using the binomial theorem.

$$|f^{-1}(y)| \le \sum_{j}\binom{n-r}{j}(m-1)^{-j}\ \sum_{k}\binom{r}{k}(m-1)^{r-k} = \left(\frac{m}{m-1}\right)^{n-r} m^r.$$

This satisfies the condition given at the beginning of the proof. ∎

Proof of Proposition 4. For $k = 0$, there is nothing to prove. If $k = 1$, then by the two previous lemmas,
$P(n,m,r-1) \le P(n-1,m,r-1) \le P(n-1-(r-1),\,m-1,\,r-1) = P(n-r,\,m-1,\,r-1)$. For $k > 1$,
the result follows by induction. ∎

Proof of Lemma 5. If $r \ge 1$, and $F(k;n,m,r-1) = 0$, then it follows immediately from the definition of $F$
that $F(k;n,m,r) = 0$ as well. Assume then that $F(k;n,m,r-1) > 0$. Since the result follows trivially
if $F(k;n,m,r) = 0$, assume $F(k;n,m,r) > 0$ also. Using (2), and omitting factors that are independent of
$r$ (since their ratio is one), we have:

$$\frac{F(k;n,m,r)}{F(k;n,m,r-1)} = \frac{m^{k(r-1)}\,((r-1)!)^k}{m^{kr}\,(r!)^k}\cdot\frac{[n]_{kr}}{[n]_{k(r-1)}}\cdot\left(\frac{m-k}{m}\right)^{(n-kr)-(n-k(r-1))}$$

$$= \frac{[n-k(r-1)]_k}{n^k}\left(\frac{m-k}{m}\right)^{-k}\left(\frac{n}{rm}\right)^{k}.$$

We are assuming $F(k;n,m,r) > 0$. However, by the way in which $F$ was defined, this can occur only if
$n \le rm$, so that the last expression above is $\le 1$. This completes the proof. ∎

Proof of Lemma 6. We must show

$$F(k;n,m,r)^2 \ge F(k;n,m,r-1)\,F(k;n,m,r+1).$$

We have already remarked that $F(k;n,m,r) \ge 0$ for any set of arguments. If $F(k;n,m,r-1) = 0$ or
$F(k;n,m,r+1) = 0$, then the statement is trivially true. If not, divide through the above formula by its
right hand side. We find

$$\frac{F(k;n,m,r)^2}{F(k;n,m,r-1)\,F(k;n,m,r+1)} = \frac{m^{k(r-1)}\,m^{k(r+1)}}{(m^{kr})^2}\left(\frac{(r-1)!\,(r+1)!}{(r!)^2}\right)^{k}\frac{([n]_{kr})^2}{[n]_{k(r-1)}\,[n]_{k(r+1)}}$$

$$= \left(\frac{r+1}{r}\right)^{k}\frac{(n-k(r-1))!\,(n-k(r+1))!}{((n-kr)!)^2} \ge 1.$$


Proof of Proposition 7. Immediate from the two previous lemmas, and the earlier remark that a
decreasing log-concave sequence decays super-exponentially. ∎

Proof  of  Lemma  8.  Because  of  its  length,  the  proof  is  given  in  the  appendix  at  the  end  of  this  paper.  | 

Proof of Proposition 9. The conditions imply that $n$, $m$, and $r$ are all nonnegative. Direct substitution
into (2) with $k = 0$ shows that $F(0;n,m,r) = 1$. By definition, $F(k;n,m,r) = 0$ if $k < 0$. Then, by the
previous lemma, for the given $n$, $m$, and $r$, the function $F$ attains its maximum at $k = 0$ and decreases for
all higher values of $k$. The decay is super-exponential because the sequence is log-concave. ∎

6  Computer  Algorithm 

The  inequalities  that  were  derived  in  the  previous  section  serve  as  the  basis  of  a  computer  algorithm  to 
determine  upper  and  lower  bounds  of  the  maximum  queue  length  as  a  function  of  m  and  n.  In  this  section, 
we  very  briefly  describe  how  this  is  done. 

Note first that the basic recurrence relation (3) implies the inequality $P(n,m,r) \ge \sum_k F(k;n,m,r) \cdot
P_{\text{low}}(n-kr, m-k, r-1)$, where $P_{\text{low}}$ is any lower bound of the function $P$. In particular, using Proposition 4,
we have $P(n,m,r) \ge \sum_k F(k;n,m,r) \cdot P(n,m,r-1) = P(n,m,r-1) \cdot \sum_k F(k;n,m,r)$. This equation can
be turned around to get an upper bound for $P(n,m,r-1)$ in terms of an upper bound for $P(n,m,r)$:

$$P(n,m,r-1) \le P(n,m,r)\cdot\left(\sum_{k} F(k;n,m,r)\right)^{-1} \le P_{\text{high}}(n,m,r)\cdot\left(\sum_{k=0}^{k_0} F(k;n,m,r)\right)^{-1}, \qquad (4)$$

where $P_{\text{high}}$ is any given upper bound of $P$, and $k_0 \ge 0$ is any nonnegative index at which to terminate
the summation. We have written a computer program that uses this relation to find an upper bound on
the probability distribution function $P(n,m,r)$. The stopping index $k_0$ is determined by terminating the
summation loop when $F(k;n,m,r)$ becomes smaller than some threshold value.

To get a lower bound, we use the following expression based on (8) (whose simple derivation is given
later):

$$P(n,m,r-1) \ge P_{\text{low}}(n,m,r) - \sum_{k \ge 1} F(k;n,m,r)\cdot P_{\text{high}}(n-kr,\,m-k,\,r-1).$$

For the estimate $P_{\text{high}}(n-kr, m-k, r-1)$, we can simply use the obvious relation $P(n,m,r) \le 1$.
This is what is done in the following section, when we find the asymptotic behavior of $P(n,m,r)$. In the
computer program, this estimate gives rather crude results. To get tighter bounds, we use a variation of (4)
in order to find $P_{\text{high}}(n-kr, m-k, r-1)$.
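The upper-bound half of this procedure might be sketched as follows. This is not the report's program: it is a minimal illustration that assumes the closed form $F(k;n,m,r) = \binom{m}{k}\,[n]_{kr}\,(r!)^{-k}\,m^{-kr}\,((m-k)/m)^{n-kr}$, works in floating point through log-gamma, starts from the exact value $P(n,m,n) = 1$, and applies relation (4) for $r = n, n-1, \ldots, 1$. The names `log_F`, `upper_bounds`, and `threshold` are hypothetical.

```python
from math import exp, lgamma, log


def log_F(k, n, m, r):
    """log F(k; n, m, r) under the assumed closed form; -inf where F = 0."""
    rest = n - k * r
    if k < 0 or k > m or rest < 0 or (k == m and rest > 0):
        return float("-inf")
    out = lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)   # log C(m, k)
    out += lgamma(n + 1) - lgamma(rest + 1)                   # log [n]_{kr}
    out -= k * lgamma(r + 1) + k * r * log(m)                 # (r!)^k m^{kr}
    if rest > 0:
        out += rest * log((m - k) / m)                        # ((m-k)/m)^{n-kr}
    return out


def upper_bounds(n, m, threshold=1e-12):
    """Return a list `high` with high[r] >= P(n, m, r) for r = 0, ..., n."""
    high = [1.0] * (n + 1)                 # P(n, m, n) = 1 exactly
    for r in range(n, 0, -1):
        total, k = 0.0, 0
        while k <= min(m, n // r):
            term = exp(log_F(k, n, m, r))
            total += term
            if term < threshold:           # stopping index k_0
                break
            k += 1
        # total >= F(0; n, m, r) = 1, so the division is safe.
        high[r - 1] = min(1.0, high[r] / total)
    return high
```

Truncating the sum early only shrinks the denominator, so the result remains a valid (if looser) upper bound.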

7  Asymptotic  Behavior 

We now investigate the asymptotic behavior of the maximum queue length. We do so under the
conditions that $n$ and $m$ grow to infinity while maintaining a constant ratio that we call $\lambda = n/m$. We
obtain the following results.

First, if $\lambda > 0$, then the expected maximum queue length $L_{\text{avg}}$ grows without bound when $n$ and $m$
grow in this fashion, and $L_{\text{avg}} \cdot (\log m/\log\log m)^{-1} \to 1$ as $m \to \infty$ for any value of $\lambda$. Secondly, $L_{\text{avg}}$ can be
found by looking at the weighting function evaluated at $k = 1$. In particular, $L_{\text{avg}}$ is close to the integer $r_1$
for which $F(1;n,m,r_1-1) > (r_1-1)^{1/2}$ and $F(1;n,m,r_1) \le r_1^{1/2}$. Finally, the convergence of $L_{\text{avg}}$ to $r_1$ is
almost certain in the following sense. Given $\lambda$ and given $\epsilon > 0$, there exists an $m_0$ such that the probability
that the maximum queue length differs from $r_1$ by more than one is less than $\epsilon$ whenever $m > m_0$. The
value $m_0$ depends on both $\lambda$ and $\epsilon$.

As  we  develop  the  proofs  of  these  statements,  we  will  twice  briefly  digress  to  show  how  they  relate  to 
the  graphs  in  Figures  1-4  that  were  discussed  earlier.  In  particular,  we  will  see  that  the  growth  rate  of  the 
maximum  queue  length,  and  the  fact  that  this  value  is  concentrated  at  only  a  few  values,  is  reflected  in  the 
propositions to be demonstrated in this section.

We start by defining the values $r_0$ and $r_1$ around which the maximum queue length clusters.

$$r_0 = r_1 - 1, \qquad \text{where} \qquad r_1 = \min\left\{r : F(1;n,m,r) \le \sqrt{r}\right\}.$$

Since $F(1;n,m,r) = 0$ if $r > n$, and $r_1 > \lambda$, the value $r_1$ is well-defined.
Before  stating  the  theorem,  we  need  to  establish  several  lemmas. 
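To make the definition concrete, the following sketch locates $r_1$ by scanning $r = 1, 2, \ldots$ and evaluating $F(1;n,m,r)$ through log-gamma. It assumes $m \ge 2$ and the form $F(1;n,m,r) = m\,[n]_r\,m^{-r}\,(r!)^{-1}\,((m-1)/m)^{n-r}$; starting the scan at $r = 1$ is also an assumption of this sketch, and the names `F1` and `r_one` are hypothetical.

```python
from math import exp, lgamma, log, sqrt


def F1(n, m, r):
    """F(1; n, m, r) under the assumed closed form (requires m >= 2)."""
    if r > n:
        return 0.0
    log_val = (log(m)
               + lgamma(n + 1) - lgamma(n - r + 1)    # log [n]_r
               - r * log(m) - lgamma(r + 1)           # 1 / (m^r r!)
               + (n - r) * log((m - 1) / m))          # ((m-1)/m)^(n-r)
    return exp(log_val)


def r_one(n, m):
    """Smallest integer r >= 1 with F(1; n, m, r) <= sqrt(r)."""
    r = 1
    while F1(n, m, r) > sqrt(r):
        r += 1
    return r


def r_zero(n, m):
    """r_0 = r_1 - 1."""
    return r_one(n, m) - 1
```

The loop always terminates because $F(1;n,m,r) = 0$ once $r > n$. Evaluating `r_one` for growing $m$ at a fixed ratio $n/m$ gives a feel for how slowly the threshold moves.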

Lemma 10. Let $\lambda > 0$ be a rational number. Let $n$ and $m$ grow to infinity while maintaining a constant
ratio $\lambda = n/m$. Let $r_0$ and $r_1$ be defined as above. Then, as $m$ goes to infinity, $r_0$ and $r_1$ also go to infinity.

Proof. Assume the contrary. Suppose $r_1$ were bounded, say by $r_1 \le R$. Using the definition (2) of $F$,
evaluated at $k = 1$, we have

$$F(1;n,m,r_1) = m\,\frac{[n]_{r_1}}{m^{r_1}\,r_1!}\left(\frac{m-1}{m}\right)^{n-r_1}.$$

We are assuming that $\lambda$ is constant and $r_1$ is bounded. This implies that, for large $m$:

$$\frac{[n]_{r_1}}{n^{r_1}} \to 1, \qquad \left(\frac{m-1}{m}\right)^{n-r_1} \to e^{-\lambda}.$$

Using this in the above formula, we have for large $m$:

$$F(1;n,m,r_1) \ge \theta\, m\, \frac{1}{R!}\, e^{-\lambda}\cdot\min\{1, \lambda^R\},$$

where $\theta = 1 + o(1)$. The expression on the right is unbounded as $m \to \infty$. By definition, however,
$F(1;n,m,r_1) \le r_1^{1/2} \le R^{1/2}$, a contradiction. Therefore, $r_1$ must be unbounded, and so too must $r_0 = r_1 - 1$. ∎

Lemma 11. With the same assumptions as in the previous lemma, the ratio

$$\frac{F(1;n,m,r_1)}{F(1;n,m,r_0)} \to 0 \qquad \text{as } m \to \infty.$$

Proof. We substitute into the expression shown above for $F(1;n,m,r)$:

$$\frac{F(1;n,m,r+1)}{F(1;n,m,r)} = \frac{r!\,m^{r}}{(r+1)!\,m^{r+1}}\left(\frac{m}{m-1}\right)^{(n-r)-(n-r-1)}\frac{[n]_{r+1}}{[n]_r} = \frac{1}{r+1}\cdot\frac{n-r}{m-1} = \frac{\lambda}{r+1}\cdot\frac{m-r/\lambda}{m-1} \to 0 \qquad \text{as } r \to \infty. \qquad (5)$$

Letting $r = r_0$, and using the previous lemma, gives this lemma. ∎

Lemma 12. Under the same conditions, $r_0, r_1 = o(\log m)$ as $m \to \infty$.

Proof. We have

$$F(1;n,m,r_0) = m\,\frac{[n]_{r_0}}{m^{r_0}\,r_0!}\left(\frac{m-1}{m}\right)^{n-r_0} \le m\,\frac{\lambda^{r_0}}{r_0!} \le m\left(\frac{\lambda e}{r_0}\right)^{r_0}, \qquad (6)$$

where the last inequality follows from Stirling's formula. Using the fact that $F(1;n,m,r_0) > r_0^{1/2} \ge 1$ by
definition, we can take logarithms and rearrange, obtaining:

$$r_0 < \frac{\log m}{\log r_0 - \log\lambda - 1}. \qquad (7)$$

Since $r_0 \to \infty$ as $m \to \infty$, the denominator on the right is unbounded, which proves the statement for $r_0$.
It then follows that $r_1 = o(\log m)$ as well. ∎

We now sharpen these results to find the asymptotic behavior of $r_0$ and $r_1$ as $m \to \infty$. We have by
definition of $r_1$ that $\sqrt{r_1} \ge F(1;n,m,r_1)$. We expand $F(1;n,m,r_1)$ in the same way as was done in (6). We
multiply through by $r_1!$ and take logarithms, obtaining

$$\tfrac{1}{2}\log r_1 + (1+o(1))\,r_1\log r_1 = (1+o(1))\,r_1\log r_1 \ge \log m + \log\left(\frac{[n]_{r_1}}{m^{r_1}}\left(\frac{m-1}{m}\right)^{n-r_1}\right) = (1+o(1))\log m.$$

We have used a weakened version of Stirling's formula, that $\log k! = k\log k\,(1+o(1))$. We now divide
by $\log r_1 = O(\log\log m)$, to get

$$r_1 \ge (1+o(1))\frac{\log m}{\log r_1} \ge (1+o(1))\frac{\log m}{\log\log m}.$$

Since $r_0 = r_1 - 1$, this shows that $r_0 \ge (1+o(1))\log m/\log\log m$ as well, which implies that $\log r_0 \ge
(1+o(1))\log\log m$. Furthermore, we have by definition that $\sqrt{r_0} < F(1;n,m,r_0)$. This means that we can
repeat the same argument as above, with $r_0$ substituted for $r_1$, and with the sense of the inequality reversed.
The result is

$$r_0 \le (1+o(1))\frac{\log m}{\log r_0} \le (1+o(1))\frac{\log m}{\log\log m}.$$

Putting  this  together,  we  have  proved  the  following. 

Proposition 13. With the same assumptions as made previously, for any positive rational $\lambda$,

$$r_0, r_1 = (1+o(1))\frac{\log m}{\log\log m} \qquad \text{as } m \to \infty.$$

The results of this proposition can be compared against the graphs shown in Figure 1. As mentioned
before, these graphs are plotted on a semilog scale. According to the proposition, the slopes of the graphs should
be decreasing as the independent variable $m$ increases. This effect can be seen for the two higher curves,
corresponding to $\lambda = 1.0$ and $\lambda = 4.0$, but the effect is very small, especially for the higher values of $m$.
This is partly because $(\log m)$-growth is not too much different from $(\log m/\log\log m)$-growth for this range
of numbers. We have also not examined how fast $r_0$ and $r_1$ converge to the asymptotic value given in
the proposition, but it should be clear from the proof that some of the approximations are quite crude for
practical values of $m$ and $n$.

It should be noted that the same technique used to prove the previous proposition can be used to show
that

$$\Gamma^{-1}(m) = (1+o(1))\frac{\log m}{\log\log m}$$

for large $m$. The expression on the left was the one obtained by Gonnet [1983] when studying an equivalent
problem connected with hash queues. The equation expresses the fact that, to the order of approximation
shown, the two expressions are asymptotically equal.

The limit theorem describes where the probability is concentrated. Since $P(n,m,r)$ is a cumulative
probability distribution, the difference $P(n,m,r) - P(n,m,r-1)$ is the probability that the maximum
number of references to any one memory is exactly $r$. From Lemma 1, and the fact that at $k = 0$ the
weighting factor $F(0;n,m,r) = 1$, we get an explicit formula for its value:

$$\delta P(n,m,r) = P(n,m,r) - P(n,m,r-1) = \sum_{k \ge 1} F(k;n,m,r)\cdot P(n-kr,\,m-k,\,r-1). \qquad (8)$$

We  are  now  ready  to  prove  the  main  result  of  this  section. 

Theorem 14. Let $\lambda > 0$ be a rational number. Let $n, m \to \infty$ at a constant ratio $n/m = \lambda$. Let $r_0$ and $r_1$
be defined as in the previous lemmas. Then

a) Under these conditions, $L_{\text{avg}} \to \infty$, where $L_{\text{avg}}$ is the expected value of the maximum
queue length.

b) For any $\lambda > 0$:

$$\lim_{m \to \infty}\ \sum_{r < r_0,\ r > r_1} \delta P(n,m,r) = 0.$$

Proof. If $r_0 \to \infty$, then it follows from (b) that the expected value of the maximum queue length
becomes infinite as well, so that (a) would then be a consequence of (b). We have already shown in the
lemmas that $r_0 \to \infty$, so it remains to show that the growth in the probability distribution $P(n,m,r)$
becomes concentrated around $r_0$ and $r_1$.

We look first at $r < r_0$. The proof follows easily from (4), taking as upper bound $P_{\text{high}}(n,m,r_0) = 1$:

$$P(n,m,r_0-1) \le \left(\sum_k F(k;n,m,r_0)\right)^{-1} \le F(1;n,m,r_0)^{-1} < \frac{1}{\sqrt{r_0}} \to 0 \qquad \text{as } r_0 \to \infty.$$

Since $r_0 \to \infty$ as $m \to \infty$, this proves half of (b):

$$\sum_{r < r_0} \delta P(n,m,r) = P(n,m,r_0-1) \to 0 \qquad \text{as } m \to \infty.$$

Next we look at $r > r_1$. It was shown previously, in Proposition 7 and Proposition 9, that the weighting
function $F$ decays super-exponentially with both $k$ and $r$. This allows us to estimate $F(k;n,m,r)$ for $r > r_1$.

We have first of all that $F(1;n,m,r_1) \le r_1^{1/2}$ by definition. Then, using (5):

$$F(1;n,m,r_1+1) = \frac{\lambda}{r_1+1}\cdot\frac{m-r_1/\lambda}{m-1}\,F(1;n,m,r_1) \le \frac{\lambda}{r_1}\,F(1;n,m,r_1) \le \frac{\lambda}{\sqrt{r_1}}.$$

Since $F$ decays super-exponentially with $r$, this implies, for $j \ge 1$,

$$F(1;n,m,r_1+j) \le \left(\frac{\lambda}{r_1}\right)^{j}\sqrt{r_1} \le \left(\frac{\lambda}{\sqrt{r_1}}\right)^{j}.$$

$F$ also decays super-exponentially with $k$. Recognizing that $F(0;n,m,r_1+j) = 1$, and using what was
just shown:

$$F(k;n,m,r_1+j) \le F(1;n,m,r_1+j)^k \le \left(\frac{\lambda}{\sqrt{r_1}}\right)^{jk}.$$


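We now use all this along with (8), taking as upper bound $P(n-kr, m-k, r-1) \le 1$.

$$\sum_{r > r_1} \delta P(n,m,r) = \sum_{r > r_1}\sum_{k \ge 1} F(k;n,m,r)\cdot P(n-kr,\,m-k,\,r-1) \le \sum_{j \ge 1}\sum_{k \ge 1}\left(\frac{\lambda}{\sqrt{r_1}}\right)^{jk}$$

$$= \sum_{j \ge 1}\left(\frac{\lambda}{\sqrt{r_1}}\right)^{j}\frac{1}{1-(\lambda/\sqrt{r_1})^{j}} \le \frac{1}{1-\lambda/\sqrt{r_1}}\sum_{j \ge 1}\left(\frac{\lambda}{\sqrt{r_1}}\right)^{j} = \frac{\lambda}{\sqrt{r_1}}\left(\frac{1}{1-\lambda/\sqrt{r_1}}\right)^{2} \to 0 \qquad \text{as } r_1 \to \infty.$$

Since $r_1 \to \infty$ as $m \to \infty$, this completes the proof. ∎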

Part (b) of this last theorem relates to the graphs shown in Figures 2-4. These graphs show the
probability that the maximum queue length is equal to a given $r$ as a function of $m$ for constant $\lambda$. According
to the theorem, one would expect the curves to have sharp peaks, with ever-narrower shape as $m$ increases.
The first of these effects is in fact evident from the graphs. It may be especially apparent from the bottom
curves in these three figures that the probability is concentrated around a few values of $r$ for each value of
$m$. It is not so clear from these figures that the curves are narrower for large $m$ than they are for small $m$.
The detailed numeric results, which we have printed to six significant digits, do indicate that this occurs,
but the effect is very small even for the wide range of values of $m$ shown here.

In the form just given, one has to evaluate $F(1;n,m,r)$ at discrete values of $r$ to get an approximation
to $L_{\text{avg}}$. It is often easier to find the root of a continuous function than it is to test a discrete function
at different points. Since $F(1;n,m,r_1)$ can be approximated by continuous functions, we get the following
result.

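Corollary 15. Given the assumptions of the previous theorem, the same results hold if $\hat{r}_0$ and $\hat{r}_1$ are used
in place of $r_0$ and $r_1$, respectively, where $\hat{r}_0 = \lfloor r_a \rfloor$ and $\hat{r}_1 = \lceil r_a \rceil$, and where $r_a$ is defined by

$$r_a = \min\left\{r : \frac{1}{\sqrt{2\pi}}\; m\, r^{-1}\, e^{-\lambda}\left(\frac{\lambda e}{r}\right)^{r} \le 1\right\}. \qquad (9)$$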


Proof. We first remark that for fixed $m$ and $\lambda$, the above expression is continuous and clearly goes
to zero as $r \to \infty$, so $r_a$ is well-defined. We have shown previously that $r_1 \to \infty$ as $m \to \infty$, and that
$r_1 = o(\log m)$, so that we have the limits

$$\frac{[n]_{r_1}}{\lambda^{r_1} m^{r_1}} \to 1, \qquad \left(\frac{m-1}{m}\right)^{n-r_1} \to e^{-\lambda}$$

as $m \to \infty$. These were obvious when $r_1$ was assumed bounded. We still have $((m-1)/m)^n \to e^{-\lambda}$ since
$\lambda$ is constant. To show that the other factors converge to one, we use the formula $\log(1+x) = x + o(x)$ as
$x \to 0$. Then, since $r_1 = o(\log m)$:

$$\log\frac{[n]_{r_1}}{\lambda^{r_1} m^{r_1}} = \log\frac{[n]_{r_1}}{n^{r_1}} \ge \log\left(\frac{n-r_1}{n}\right)^{r_1} = r_1\log\left(1-\frac{r_1}{n}\right) \approx -\frac{r_1^2}{\lambda m} \to 0;$$

$$\log\left(\frac{m}{m-1}\right)^{r_1} = r_1\log\left(1+\frac{1}{m-1}\right) \approx \frac{r_1}{m-1} \to 0.$$

We also have Stirling's formula: $r_1! = R_{r_1}(2\pi r_1)^{1/2}(r_1/e)^{r_1}$, where $R_{r_1} \to 1$ as $r_1 \to \infty$. We use these
limits to approximate $F(1;n,m,r_1)$. In the formula that follows, $\theta$ represents the correction factor to account
for the approximations. We have just shown $\theta \to 1$ as $m \to \infty$.


$$F(1;n,m,r_1) = m\,\frac{[n]_{r_1}}{m^{r_1}\,r_1!}\left(\frac{m-1}{m}\right)^{n-r_1} = \theta\,\frac{1}{\sqrt{2\pi}}\,\frac{m}{\sqrt{r_1}}\left(\frac{\lambda e}{r_1}\right)^{r_1} e^{-\lambda} \le \sqrt{r_1}.$$

The inequality is true by definition of $r_1$. If we divide through by $\sqrt{r_1}$ and compare with (9), we find
that $r_a \to r_1$, except for two differences. First, $r_1$ is defined to be an integer, whereas $r_a$ need only be real.
Secondly, there is the factor of $\theta$. So instead of $r_1 \to r_a$, we have $r_1 = \lceil \theta r_a \rceil$. But even though $\theta \to 1$, if $r_a$
is close to one, it may be that $\lceil \theta r_a \rceil \ne \lceil r_a \rceil$, so that we cannot just cite the previous theorem.

The corollary is still true, however. We defined $r_1$ as a minimum integer satisfying $F(1;n,m,r_1) \le r_1^{1/2}$.
In the proofs, the only properties we used of the square root function are that $x^{1/2} \to \infty$ and $x^{1/2}/x \to 0$.
We could as easily have defined $r_1$ by the relation $F(1;n,m,r_1) \le r_1^{\alpha}$ for any $\alpha$ in the range $0 < \alpha < 1$. The
corresponding effect in (9) is to change the factor $r^{-1}$ to $r^{-(\alpha+1/2)}$.

Assume then that we have defined $r_1'$ and $r_1''$ in the same way as $r_1$ was defined, but using $\alpha = 0.4$ and
$\alpha = 0.6$ respectively instead of $\alpha = 0.5$. The limit theorem now holds for both $r_1'$ and $r_1''$. For $m$ sufficiently
large, since $\theta \to 1$, we have $r_a^{0.4} < r_a^{0.5}/\theta < r_a^{0.6}$.

Because of this, and because of the way in which $r_1'$ and $r_1''$ are defined, for large $m$ we will always have
$r_1'' \le \lceil r_a \rceil \le r_1'$. Since the limit theorem is true for both $r_1'$ and $r_1''$, it is true for $r_a$ as defined above. ∎
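The continuous threshold $r_a$ can be located with a simple root finder. The sketch below is an illustration under stated assumptions: it takes the expression in (9) to be $g(r) = (2\pi)^{-1/2}\, m\, r^{-1} e^{-\lambda} (\lambda e/r)^r$, works with $\log g$, and uses bisection on $r \ge \max(\lambda, 1)$, where $\log g$ is strictly decreasing because its derivative is $-1/r + \log(\lambda/r) < 0$ for $r \ge \lambda$. It further assumes that the root lies in that region, which it checks at the left endpoint. The names `log_g` and `r_a` are hypothetical.

```python
from math import ceil, e, floor, log, pi


def log_g(r, m, lam):
    """Logarithm of the assumed left-hand side of (9)."""
    return log(m) - lam - 0.5 * log(2 * pi) - log(r) + r * log(lam * e / r)


def r_a(m, lam, tol=1e-10):
    """Root of g(r) = 1 on r >= max(lam, 1), found by bisection."""
    lo = max(lam, 1.0)
    if log_g(lo, m, lam) <= 0:
        raise ValueError("g is already <= 1 at the left endpoint; m is too small for this sketch")
    hi = 2 * lo
    while log_g(hi, m, lam) > 0:      # expand until the sign changes
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if log_g(mid, m, lam) > 0:
            lo = mid
        else:
            hi = mid
    return hi


def r_hats(m, lam):
    """The pair (floor(r_a), ceil(r_a)) used in place of (r_0, r_1)."""
    root = r_a(m, lam)
    return floor(root), ceil(root)
```

Working with the logarithm avoids overflow in $(\lambda e/r)^r$ and keeps the bisection well conditioned for large $m$.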

8 Summary and Conclusions

This paper has presented a new method for analyzing the lengths of memory queues when the network
is conflict-free. An algorithm based on this method efficiently determines upper and lower bounds of the
queue length. We have also analyzed the asymptotic behavior.

Our analysis indicates that the strategy of using hashing to spread data across memory modules is a
good one. If the size of the system is increased, while maintaining a constant ratio of numbers of processors
to memories, then asymptotically, the slowdown in performance from the effect studied by this paper is
$\Theta(\log m/\log\log m)$. For $m$ and $n$ less than 100,000, and $\lambda$ between 0.25 and 4.0, the graphical data confirm
this growth rate.

Although  it  is  worthwhile  to  have  bounds  on  the  value  desired,  this  cannot  be  considered  a  full  solution 
to  the  problem.  A  drawback  of  the  method  shown  here  is  that  it  does  not  allow  us  to  sharpen  the  estimates 
in  a  convenient  manner.  This  is  not  important  for  the  memory  performance  problem  studied  here,  but  it 
could  matter  in  other  applications,  if  more  precise  estimates  were  needed. 

This paper has looked only at systems where there are independent data paths between each processor
and memory, such as across a crossbar network. A crossbar requires $n^2$ crosspoints to connect $n$ processors
to $n$ memories, and it would probably be too expensive for large $n$. Consequently, it would be worthwhile
to perform similar analysis on other network topologies, such as the perfect shuffle, where the size of the
network is $O(n\log n)$. The problem is more complex with other network topologies, because there can be
conflicts within the network as well as at the memories. Kruskal and Snir [1983] have in fact looked at
this problem for the perfect shuffle with the same assumptions as were made here, with processors making
independent accesses to memories and with equal probability to access any memory. These assumptions
are not always valid, and it would be worthwhile to analyze the same problem under different assumptions,
such as when several of the memories are more favored to be referenced. Pfister and Norton [1985] have
shown that an omega network gets saturated under these conditions. Some of their results were based on
simulation, and it may be possible to get numeric expressions for the effect they found by using methods
like those used here.

Appendix
Proof of Lemma 8

Here  we  prove  Lemma  8.  Suppose  that  the  variables  n,  m,  and  r  are  held  fixed,  and  r  ^^  n/m.  We  must 
show  that  the  function  F[k;  n,  m,  r)  is  log-concave  in  the  variable  k. 

Assuming  k,  n,  m,  and  r  are  all  >  0,  we  first  determine  which  values  of  k  make  F(k;  n,  m,  r)  —  0.  From 
the  definition  (2)  of  F,  this  occurs  only  when  at  least  one  of  the  factors  ('^),  [n\kT,  or  m  —  fc  is  equal  to  zero, 
unless  m  —  A;=n  —  A:r  =  0,  which  is  excluded  by  hypothesis.  Apart  from  this  exceptional  case,  the  function 
is  zero  when  fc  <  0,  or  A;  >  m,  or  Air  >  n.  Consequently,  F(k\n,m,r)  is  positive  if  and  only  if 

fc  >  0,  m-k>\,  and     n  -  itr  >  0.  (10) 

In  addition,  the  recurrence  is  used  only  for  values  of  r  >  1. 

It turns out that the proof we are about to give does not work for the case $m - k = n - kr = 0$, and this
is why we assume $r \ne n/m$ in the statement of the lemma. It can be shown that $F(k; n, m, r)$ is unimodal
in $k$ for this case, although not necessarily log-concave. Since we do not need this result, however, we do
not give the proof. Rather, we assume the arguments of the function satisfy (10), and show that $F$ is then
log-concave in $k$. We must show

$$F(k; n, m, r)^2 \ge F(k-1; n, m, r)\,F(k+1; n, m, r) \tag{11}$$

in those cases where both sides of the inequality are positive. Using (10) for $F(k-1; n, m, r)$ and
$F(k+1; n, m, r)$, we see that we can limit ourselves to $k$ in the range

$$k \ge 1, \quad m - k \ge 2, \quad \text{and} \quad n - kr \ge r. \tag{12}$$
Working with $k$ in this range, divide (11) through by its right hand side, and call the resulting expression
"$X$". We do not explicitly show the dependence of $X$ on $k$, $n$, $m$, and $r$. We must show

$$X = \frac{F(k; n, m, r)^2}{F(k-1; n, m, r)\,F(k+1; n, m, r)} \ge 1.$$

We now use the definition (2) of $F(k; n, m, r)$ to get an expression for $X$:

$$
X = \frac{\left[\dfrac{m!\,n!}{k!\,(m-k)!\,(n-kr)!\,m^{kr}\,(r!)^{k}}\left(\dfrac{m-k}{m}\right)^{n-kr}\right]^{2}}
{\dfrac{m!\,n!}{(k-1)!\,(m-k+1)!\,(n-kr+r)!\,m^{kr-r}\,(r!)^{k-1}}\left(\dfrac{m-k+1}{m}\right)^{n-kr+r}
\cdot
\dfrac{m!\,n!}{(k+1)!\,(m-k-1)!\,(n-kr-r)!\,m^{kr+r}\,(r!)^{k+1}}\left(\dfrac{m-k-1}{m}\right)^{n-kr-r}}
$$

$$
= \frac{(k+1)(m-k+1)\,(n-kr+r)!\,(n-kr-r)!}{k\,(m-k)\,[(n-kr)!]^{2}}
\left(\frac{(m-k)^2}{(m-k)^2-1}\right)^{n-kr-r}\left(\frac{m-k}{m-k+1}\right)^{2r}.
$$

We want to show that this is always $\ge 1$ for $(k, n, m, r)$ satisfying the conditions shown above. The
problem is that the dependence of $X$ on these four variables is not at all simple. If we increase or decrease
any one of them, then some factor increases and another decreases.
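The simplification of $X$ can be checked with exact rational arithmetic. The sketch below is only a sanity check, not part of the proof. It assumes that $F$ has the form read off from the factors appearing in the expression for $X$ in this appendix (definition (2) itself is not restated here), and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def F(k, n, m, r):
    """Assumed form of F(k; n, m, r), read off from the factors of X.

    F = C(m,k) * n!/(n-kr)! / (m**(k*r) * (r!)**k) * ((m-k)/m)**(n-kr),
    taken to be zero outside 0 <= k <= m, k*r <= n.
    """
    if k < 0 or k > m or k * r > n:
        return Fraction(0)
    falling = Fraction(factorial(n), factorial(n - k * r))
    scale = Fraction(1, m ** (k * r) * factorial(r) ** k)
    tail = Fraction(m - k, m) ** (n - k * r)
    return comb(m, k) * falling * scale * tail


def x_from_definition(k, n, m, r):
    """X = F(k)^2 / (F(k-1) * F(k+1)), computed exactly."""
    return F(k, n, m, r) ** 2 / (F(k - 1, n, m, r) * F(k + 1, n, m, r))


def x_closed_form(k, n, m, r):
    """Simplified expression for X in terms of m - k and n - k*r."""
    M, N = m - k, n - k * r
    factorials = Fraction(
        (k + 1) * (M + 1) * factorial(N + r) * factorial(N - r),
        k * M * factorial(N) ** 2,
    )
    powers = Fraction(M * M, M * M - 1) ** (N - r) * Fraction(M, M + 1) ** (2 * r)
    return factorials * powers


def check_range(max_m=7, max_n=14, max_r=3):
    """Compare both forms of X for every k in range (12); return the case count."""
    cases = 0
    for m in range(3, max_m + 1):
        for n in range(1, max_n + 1):
            for r in range(1, max_r + 1):
                for k in range(1, m - 1):      # k >= 1 and m - k >= 2
                    if n - k * r < r:          # need n - k*r >= r
                        continue
                    x = x_from_definition(k, n, m, r)
                    assert x == x_closed_form(k, n, m, r)
                    assert x > 1               # log-concavity holds at this k
                    cases += 1
    return cases
```

Calling `check_range()` walks every $k$ allowed by (12) for small $m$, $n$, and $r$, and raises an `AssertionError` if the two forms of $X$ disagree or if $X \le 1$ anywhere.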

As a first step in separating the dependence of $X$ on the variables, notice that $m$ always occurs as part
of the expression $m - k$, and $n$ occurs as $n - kr$. The conditions (12) are also expressed in terms of $m - k$
and $n - kr$, so we make the substitutions $M = m - k \ge 2$ and $N = n - kr \ge r$. $X$ then becomes:

$$X = \frac{(k+1)(M+1)\,(N+r)!\,(N-r)!}{k\,M\,(N!)^2}\left(\frac{M^2}{M^2-1}\right)^{N-r}\left(\frac{M}{M+1}\right)^{2r}. \tag{13}$$

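We now look for the value of $N$, in terms of $k$, $r$, and $M$, which minimizes $X$. In particular, we see
what happens to $X$ when $N$ is increased by one.

$$
\frac{X|_{N+1}}{X|_{N}}
= \frac{(N+1+r)!\,(N+1-r)!}{[(N+1)!]^2}\cdot\frac{(N!)^2}{(N+r)!\,(N-r)!}\cdot\frac{M^2}{M^2-1}
= \frac{(N+1)^2 - r^2}{(N+1)^2}\cdot\frac{M^2}{M^2-1}
\;\begin{cases}
< 1 & \text{if } (N+1)/r < M; \\
= 1 & \text{if } (N+1)/r = M; \\
> 1 & \text{if } (N+1)/r > M.
\end{cases}
$$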
Thus, $X$ is minimized if $N = rM - 1$ or $N = rM$. Choose the latter, and plug into (13).

$$X \ge \frac{(k+1)(M+1)\,[r(M+1)]!\,[r(M-1)]!}{k\,M\,[(rM)!]^2}\left(\frac{M^2}{M^2-1}\right)^{r(M-1)}\left(\frac{M}{M+1}\right)^{2r}. \tag{14}$$
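The minimization over $N$ can be illustrated numerically. The following sketch evaluates the right-hand side of (13) exactly, compares the step ratio with its simplified form $(1 - r^2/(N+1)^2)/(1 - 1/M^2)$, and lists the minimizing values of $N$. The function names are hypothetical, and the search is limited to a finite range of $N$ chosen by the caller.

```python
from fractions import Fraction
from math import factorial


def x13(N, M, r, k=1):
    """Right-hand side of (13) as an exact rational (needs N >= r, M >= 2)."""
    factorials = Fraction(
        (k + 1) * (M + 1) * factorial(N + r) * factorial(N - r),
        k * M * factorial(N) ** 2,
    )
    powers = Fraction(M * M, M * M - 1) ** (N - r) * Fraction(M, M + 1) ** (2 * r)
    return factorials * powers


def step_ratio(N, M, r):
    """X at N + 1 divided by X at N; k cancels, so any k >= 1 would do."""
    return x13(N + 1, M, r) / x13(N, M, r)


def predicted_ratio(N, M, r):
    """(1 - r^2/(N+1)^2) / (1 - 1/M^2), the simplified form of the ratio."""
    return (1 - Fraction(r * r, (N + 1) ** 2)) / (1 - Fraction(1, M * M))


def minimizers(M, r, n_max):
    """All N in [r, n_max] at which (13) attains its minimum."""
    values = {N: x13(N, M, r) for N in range(r, n_max + 1)}
    smallest = min(values.values())
    return [N for N, value in values.items() if value == smallest]
```

For example, `minimizers(3, 2, 20)` returns `[5, 6]`, that is, $N = rM - 1$ and $N = rM$, which tie because the step ratio is exactly 1 at $N = rM - 1$.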

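For $r = 1$, this becomes

$$X \ge \frac{(k+1)(M+1)(M+1)}{k\,M\,M}\left(\frac{M^2}{M^2-1}\right)^{M-1}\left(\frac{M}{M+1}\right)^{2} = \frac{k+1}{k}\left(\frac{M^2}{M^2-1}\right)^{M-1} > 1.$$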
Thus, the statement is proved for $r = 1$. Assume now that $r \ge 2$. We use the following version of
Stirling's formula, a stronger form of which appears in Mitrinovic [1970], pp. 181 ff.:

$$\sqrt{2\pi n}\; n^{n} e^{-n} < n! < \sqrt{2\pi n}\; n^{n} e^{-n} \exp\left(\frac{1}{12n}\right).$$
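As a quick sanity check of these two-sided bounds, the sketch below compares their logarithms with $\log n!$ for small $n$. It works in floating point, so it is an illustration rather than a proof, and the function names are hypothetical.

```python
import math


def stirling_log_bounds(n):
    """Logarithms of the lower and upper Stirling bounds on n! (n >= 1)."""
    lower = 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n
    upper = lower + 1.0 / (12 * n)
    return lower, upper


def bounds_hold(n):
    """True if lower < log(n!) < upper, with log(n!) taken from the exact integer."""
    lower, upper = stirling_log_bounds(n)
    log_factorial = math.log(math.factorial(n))
    return lower < log_factorial < upper
```

The gap between $\log n!$ and the upper bound shrinks roughly like $1/(360 n^3)$, so for very large $n$ a floating-point comparison of this kind would eventually become unreliable.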

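When this formula is used in (14), the factors of the form $(2\pi)^{1/2} e^{-n}$ cancel in the numerator and
denominator. In the first formula that now follows, all factors from (14) except the approximation to the factorial
functions appear on the first line. We are left with the following:

$$
\begin{aligned}
X &> \frac{(k+1)(M+1)}{k\,M}\left(\frac{M^2}{M^2-1}\right)^{r(M-1)}\left(\frac{M}{M+1}\right)^{2r} \\
&\qquad\times \left(\frac{r^2(M^2-1)}{r^2M^2}\right)^{r(M-1)+1/2}\left(\frac{r(M+1)}{rM}\right)^{2r}\exp\left(-\frac{2}{12rM}\right) \\
&> \frac{M+1}{M}\left(\frac{M^2-1}{M^2}\right)^{1/2}\exp\left(-\frac{1}{6rM}\right)
 = \left(\frac{M^4+2M^3-2M-1}{M^4}\right)^{1/2}\exp\left(-\frac{1}{6rM}\right).
\end{aligned}
\tag{15}
$$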

By taking logarithms, and using the fact that $r \ge 2$ and $M \ge 2$, it is easily verified that this last
expression is $> 1$. The argument of the exponential function is increasing in $r$, so its logarithm can be
estimated by evaluating the argument at $r = 2$. The logarithm of the other factor is estimated by using the
following inequality from Mitrinovic [1970], p. 273:

$$0 < y < x \;\Longrightarrow\; \log\frac{x}{y} > 2\,\frac{x-y}{x+y}.$$

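Since $M \ge 2$, there are also the obvious relations $M + 1/2 < M^3/2$ and $M^2 - 1 < M^3/2$. Putting this in
(15) gives

$$
\begin{aligned}
\log X &> \frac{1}{2}\times 2\times\frac{2M^3-2M-1}{2M^4+2M^3-2M-1} - \frac{1}{6rM} \\
&> \frac{1}{M}\cdot\frac{M^3-M-1/2}{M^3+M^2-1} - \frac{1}{12M} \\
&> \frac{1}{M}\cdot\frac{M^3-M^3/2}{M^3+M^3/2} - \frac{1}{12M} = \frac{1}{4M} > 0.
\end{aligned}
$$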
This shows that $X > 1$, which means that $F(k; n, m, r)$ is log-concave in the variable $k$. ∎
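The final estimate can also be checked numerically. The sketch below evaluates the logarithm of the lower bound (15), namely $\tfrac12\log\bigl((M^4+2M^3-2M-1)/M^4\bigr) - 1/(6rM)$, and compares it with $1/(4M)$ over a finite grid of $M \ge 2$ and $r \ge 2$. The function names are hypothetical, and a finite floating-point scan only illustrates the inequality without replacing the argument above.

```python
import math


def log_lower_bound(M, r):
    """Logarithm of the lower bound (15) on X."""
    polynomial_ratio = (M**4 + 2 * M**3 - 2 * M - 1) / M**4
    return 0.5 * math.log(polynomial_ratio) - 1.0 / (6 * r * M)


def exceeds_quarter_bound(M, r):
    """True if the log of (15) is larger than 1/(4M), hence positive."""
    return log_lower_bound(M, r) > 1.0 / (4 * M)


def obvious_relations_hold(M):
    """The two auxiliary relations used in the chain, valid for M >= 2."""
    return (M + 0.5 < M**3 / 2) and (M**2 - 1 < M**3 / 2)
```

The tightest case on the grid is small $M$ with $r = 2$; the margin over $1/(4M)$ stays comfortable there, which suggests the constants in the chain are far from sharp.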
[Figure: Low and High Expected Queue Length, Lambda = 0.25, 1.0, 4.0]